Video-Action Model - yuyan

Video-Action Model

Vision Language Action Model

Video Foundation Model

Vision Language Model

Learning from Video

mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs

https://arxiv.org/abs/2512.15692

egoverse

https://egoverse.ai/

VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

https://huggingface.co/papers/2603.23481

Video-Action Models excel at visual reasoning on long-horizon tasks, but vision alone is insufficient for manipulation where contact is critical.